Minimal Resources for Arabic Parsing: an Interactive Method for the Construction of Evolutive Automata
Authors
Abstract
We present scenarios showing the interactive construction of operators. Some grammars and their progressive refinement through the “feedback” method are given as examples: a kernel of grammars for retrieving quotations and a grammar reflecting a common syntactic operator. We also recall previously developed morphological parsers. These grammars are designed as finite-state automata (FSA), some of them made deterministic for better performance, using the Sarfiyya software, which was developed for this purpose and supports many operations on FSA. Purely algorithmic, this approach uses minimal resources, is largely independent of lexicons, gives tool words a prominent place and bases parsing on surface structures. On the theoretical level, it aims to highlight a specific property of Arabic: its high degree of grammaticalization makes it possible, in the limiting case, to work without a lexicon. This work is thus of interest to the linguist looking for the right balance between lexicon and grammar, as well as to the specialist in cognitive science (duality between data and programs). On the practical level, it aims to establish a coherent methodology for creating multipurpose search operators.
Information retrieval and reported discourse
As far as information retrieval (IR) is concerned, quotations and reported discourse are undoubtedly of great use in exploiting large corpora. Their automatic retrieval in Arabic corpora raises a good number of questions. We have chosen this topic and linked it to the related grammars involved in retrieving reported discourse: first, the syntactic structures of this type of discourse, which involve specific and fundamental operators; second, a specific structure often linked with this type of discourse and expressing a judgment by the speaker (the “min al-” structure).
As we can see, these grammars are underpinned by syntax, and each addresses a different type of question: information retrieval, syntax and morphology. This paper deals essentially with scenarios showing the interactive construction of operators.
In many cases it is more appropriate to speak of indirect reported discourse. This holds especially true for newspapers, which are our main source for this study, since indirect speech is much more frequent in them than direct reporting of what someone said in another context. Besides, although quotations and indirect speech share certain elements, they do not entail the same syntactic structures, as we will see below.
Reported discourse involves a speaker, the discourse he is said to have uttered and someone who reports it, usually in given circumstances. There are thus numerous external, material marks that can help detect indirect reported discourse, for instance proper names, whether that of the author of the discourse or that of the person who reports it. Both can be accompanied by a title, quality or position. For instance:
أعلن الدكتور مختار خطاب وزير قطاع الأعمال العام انه يجرى حاليا تنفيذ الاجراءات النهائية لبيع
Dr. Mukhtar Khattab, Minister of Public Works, declared that the final procedures for the selling of ... are now carried out.
Unfortunately, proper names in general are a source of difficulty and have to be put in a lexicon. In the type of grammars we designed, they are analyzed as common nouns if they have an Arabic root, or as a silence if they are foreign. In the latter case, however, the silence can be very useful for revealing a proper name, be it that of a person or of a country. Titles can also be put in a lexicon and used to delineate the parts of discourse quoted. As for the circumstances of the discourse or of the report, they can also be exploited towards our goal.
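The use of silences and titles as cues for proper names can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the authors' implementation: `toy_analyze` stands in for a morphological parser and returns `None` when a token receives no analysis (a silence), and the `TITLES` and `KNOWN_WORDS` lists are hypothetical and deliberately tiny.

```python
# Hypothetical title list (assumption); a real system would use a curated resource.
TITLES = {"الدكتور", "الرئيس", "وزير"}  # doctor, president, minister

# Toy stand-in for a morphological parser (assumption): it "analyzes" only
# the few words listed here and returns None (a silence) for anything else.
KNOWN_WORDS = {"قال", "أعلن", "الرئيس", "الدكتور", "مختار", "أن"}


def toy_analyze(token):
    """Return a dummy analysis for known words, None for a silence."""
    return "analyzed" if token in KNOWN_WORDS else None


def proper_name_candidates(tokens, analyze=toy_analyze, titles=TITLES):
    """Return (index, token, reason) triples for likely proper-name tokens.

    A token is flagged when the analyzer is silent on it, or when it is
    analyzable but immediately follows a title.
    """
    candidates = []
    for i, token in enumerate(tokens):
        if token in titles:
            continue  # the title itself is a cue, not a name
        if analyze(token) is None:
            candidates.append((i, token, "silence"))
        elif i > 0 and tokens[i - 1] in titles:
            candidates.append((i, token, "after-title"))
    return candidates


if __name__ == "__main__":
    # A foreign name: no analysis is found, so it surfaces as a silence.
    print(proper_name_candidates(["قال", "الرئيس", "بوتين", "أن"]))
    # A name with an Arabic root: analyzable, but flagged by the preceding title.
    print(proper_name_candidates(["أعلن", "الدكتور", "مختار"]))
```

Passing the analyzer as a parameter keeps the heuristic independent of any particular parser, in keeping with the minimal-resource stance described above.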
Another issue we have to address is punctuation, which is deficient and cannot be relied upon because of its lack of uniformity. Quotation marks, which could be of great use for pinpointing quotations, are not dependable. Fortunately, indirect discourse does not entail the use of quotation marks.
From our line of action, the reader will gather that we adopted a surface approach, which guided the design of our grammars. By this we mean a number of things:
1. these grammars are based on the morphology of the Arabic noun, without the introduction of a lexicon;
2. consequently, they account for the rules of morphology in a minimal approach;
3. we chose to represent these rules by automata, because automata make possible a remarkably concise representation of Arabic morphological data and, conversely, this conciseness reflects the very nature of Arabic.
On the methodological level, this approach differs from other contemporary approaches: purely algorithmic, it uses minimal resources, is independent of lexicons, gives tool words a prominent role and bases parsing on surface structures.
The most prominent indicator of indirect reported discourse is its syntactic features. To summarize: first, the type of verb that introduces it; then the conjunction that verb governs; and finally the preposition it is construed with.
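The surface cues just summarized lend themselves to a token-level automaton. The sketch below is a hedged illustration, not the Sarfiyya implementation: the state names, the tiny `REPORT_VERBS` and `CONJUNCTIONS` lists and the `subset_construction` helper are all assumptions introduced here, and the preposition cue is left out for brevity. It defines a small nondeterministic automaton that accepts a reporting verb followed, at any distance, by a conjunction, and makes it deterministic with the classical subset construction.

```python
from collections import deque

# Hypothetical, deliberately tiny tool-word lists (assumptions, not from the paper).
REPORT_VERBS = {"أعلن", "قال", "أكد"}        # declared, said, confirmed
CONJUNCTIONS = {"أن", "إن", "انه", "أنه"}     # "that", with or without a pronoun
ALPHABET = ("VERB", "CONJ", "OTHER")


def classify(token):
    """Map a surface token to one of the three input symbols."""
    if token in REPORT_VERBS:
        return "VERB"
    if token in CONJUNCTIONS:
        return "CONJ"
    return "OTHER"


# Nondeterministic automaton: q0 = nothing seen, q1 = verb seen,
# q2 = verb followed later by a conjunction (accepting).
NFA = {
    ("q0", "VERB"): {"q0", "q1"},
    ("q0", "CONJ"): {"q0"},
    ("q0", "OTHER"): {"q0"},
    ("q1", "VERB"): {"q1"},
    ("q1", "CONJ"): {"q1", "q2"},
    ("q1", "OTHER"): {"q1"},
    ("q2", "VERB"): {"q2"},
    ("q2", "CONJ"): {"q2"},
    ("q2", "OTHER"): {"q2"},
}


def subset_construction(nfa, start, accepting, alphabet):
    """Build an equivalent DFA whose states are frozensets of NFA states."""
    start_set = frozenset({start})
    dfa, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            target = frozenset(
                nxt for state in current for nxt in nfa.get((state, symbol), ())
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {state_set for state_set in seen if state_set & accepting}
    return dfa, start_set, dfa_accepting


DFA, DFA_START, DFA_ACCEPTING = subset_construction(NFA, "q0", {"q2"}, ALPHABET)


def accepts(tokens):
    """Run the deterministic automaton over a list of tokens."""
    state = DFA_START
    for token in tokens:
        state = DFA[(state, classify(token))]
    return state in DFA_ACCEPTING


if __name__ == "__main__":
    print(accepts(["أعلن", "الوزير", "انه", "يجرى"]))
    print(accepts(["انه", "يجرى"]))
```

Once determinized, each token costs a single dictionary lookup, which is the kind of performance gain that motivates making part of the automata deterministic.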
Publication date: 2009